Spelling Error Patterns in Brazilian Portuguese

Authors

  • Priscila A. Gimenes
  • Norton Trevisan Roman
  • Ariadne M. B. R. Carvalho

Abstract

Fifty years after Damerau set up his statistics for the distribution of errors in typed texts, his findings are still used in a range of different languages. Because these statistics were derived from texts in English, the question has been raised of whether they actually apply to other languages. We address this issue by analysing a set of typed texts in Brazilian Portuguese and deriving statistics tailored to this language. Results show that diacritical marks play a major role, as indicated by the frequency of mistakes involving them. This renders Damerau’s original findings mostly unfit for spelling correction systems, although they remain useful if such marks are set aside. Furthermore, a comparison between these results and those published for Spanish shows no statistically significant differences between the two languages, an indication that the distribution of spelling errors depends on the adopted character set rather than on the language itself.
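
As a rough illustration of what "setting aside" diacritical marks could mean in practice, the following Python sketch checks whether a typed word differs from its intended form only in its diacritics. The function names and the decision to treat every Unicode combining mark (including the cedilla) as a diacritic are assumptions made for this example; they are not taken from the article and may not match the authors' own classification.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with all Unicode combining marks removed (assumed definition of 'diacritics')."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_diacritic_only_error(typed: str, intended: str) -> bool:
    """True if the two words differ, but only in their diacritical marks."""
    # Compare in NFC so that different encodings of the same letter do not count as errors.
    typed_nfc = unicodedata.normalize("NFC", typed)
    intended_nfc = unicodedata.normalize("NFC", intended)
    if typed_nfc == intended_nfc:
        return False
    return strip_diacritics(typed_nfc) == strip_diacritics(intended_nfc)


if __name__ == "__main__":
    # Hypothetical word pairs (typed, intended), chosen only for illustration.
    for typed, intended in [("voce", "você"), ("caza", "casa"), ("ação", "ação")]:
        print(typed, intended, is_diacritic_only_error(typed, intended))
```

Under this assumed definition, a pair such as "voce" and "você" would be counted as a diacritic error, whereas "caza" and "casa" would fall under the letter-level error classes that Damerau's statistics describe.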

Similar resources

Unconventional word segmentation in Brazilian children’s early text production

An important element of learning to read and write at school is the ability to define word boundaries. Defining word boundaries in text writing is not a straightforward task even for children who have mastered graphophonemic correspondences. In children’s writing, unconventional word segmentation has been observed across a range of languages and contexts with more occurrences of hyposegmentatio...

Dermatology and the Brazilian Portuguese language orthographic reform.

The Brazilian Portuguese orthographic reform has changed the spelling of less than 2% of the lexis. However, these changes have affected medical practice. In this article the authors present the main changes in the orthographic rules and gather a group of words whose spelling was altered by the reform, with emphasis on dermatological terms.

‘Minor’ Languages, ‘Broken’ Translations: On Brazilian Reworkings of an Albanian Novel

This essay approaches the challenges of global translation in the 21st century from what might still be considered a somewhat uncommon example: a direct translation of Ismail Kadaré's 1978 novel Prill e thyër (Broken April) from the original Albanian into Brazilian Portuguese in 2001. Not only does it examine and compare lexical elements in the source and target texts and the usage of translato...

Automatic Detection of Spelling Variation in Historical Corpus: An Application to Build a Brazilian Portuguese Spelling Variants Dictionary

The Historical Dictionary of Brazilian Portuguese (HDBP), the first of its kind, is based on a corpus of Brazilian Portuguese (BP) texts from the sixteenth through the eighteenth centuries (and some texts from the beginning of the nineteenth century), being developed under the sponsorship of the Brazilian funding agency CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). It is...

Building a Corpus-based Historical Portuguese Dictionary: Challenges and Opportunities

Historical corpora are important resources for different areas. Philology, Human Language Technology, Literary Studies, History, and Lexicography are some that benefit from them. However, compiling historical corpora is different from compiling contemporary corpora. Corpus designers have to deal with several characteristics inherent in historical texts, such as: absence of a spelling standard, ...

Journal:
  • Computational Linguistics

Volume 41  Issue 

Pages  -

Publication date: 2015